XEROX 



PALO ALTO RESEARCH CENTER 

February 21, 1977 



To: All ALTO Users 

From: Ted Strollo and Bob Sproull 

Subject: ALTO Main Memory Parity Errors 



The improper use of the two switches which affect memory parity error 
handling in the long INSTALL dialogues is dangerous. If you are a novice 
ALTO user and/or don't wish to take the time to read this memo, answer NO 
to both questions about handling memory parity errors. If you answer yes, 
you run the risk of clobbering your disk files and of masking problems in 
your machine, which will make those problems even harder for the 
maintainers to fix. 

This memo is to clarify for ALTO users some possible confusion which may 
exist about the ALTO operating system's handling of memory parity errors. 
In particular, several months ago, two switches were added to the INSTALL 
dialogue for installing a new operating system. These switches enable the 
user to ignore parity errors and not have them reported over the Ethernet. We 
want to emphasize that if the user invokes these ignore operations, he does so 
at his own risk. In particular, the setting of the first switch to disable parity 
error detection does just exactly that. Parity errors which occur are ignored. 
This attempts to get the machine to limp along. The user could invoke the 
same action without setting this switch by typing control P to SWAT to 
attempt a proceed whenever an error occurs. Of course this requires much 
patience. The setting of this switch makes for some convenience, but the 
consequence is that the user has absolutely no idea that a parity error has 
occurred, and no information about parity errors from that machine gets sent 
to MAXC for analysis by the maintainers. A likely consequence is that with 
that switch left on, the machine will get sicker and finally display very 
anomalous behavior (such as crashing to SWAT, smashing disk files...) which 
finally alerts the user that his machine is seriously ill. 

The switches are there in recognition of the fact that we cannot repair an 
ALTO memory problem in zero time, and in recognition of a particular 
problem within the ALTO II parity error detection logic which is described 
below. They are therefore expedient to use when a machine has a known 
memory problem which has been isolated but has not yet been fixed. We want 
to emphasize again that other uses of these switches are dangerous and are not 
endorsed by us. 



Types of ALTO Parity Errors 



ALTOs make basically three types of parity errors. These are: 

A) Regular parity errors where the location is found and reported. 

B) Phantom parity errors. These are errors reported to the hardware, but a 
sweep of memory does not find a location with bad parity. This is a problem 
with some ALTO I's which are believed to have electrical problems associated 
with the parity error computation circuitry itself. The second switch in the 
INSTALL dialogue about parity error detection/reporting allows you to select 
whether you want such errors reported over the Ethernet to MAXC. If you 
have an ALTO I with a chronic case of this problem, it may be reasonable for 
you to set this second switch since apparently little can be done to fix this 
problem. 

C) Garbled parity error information. This problem is unique to the ALTO 
II and is being fixed by an ECO to the machine, which is in process. In 
this case a real parity error occurs and a sweep of memory finds the error, but 
the information about the location of the error is garbled and erroneously 
reported to the hardware due to a timing problem in the hardware. 
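
The three cases above can be told apart by comparing the location the 
hardware reports with what a sweep of memory finds. The following Python 
sketch is only an illustration of that comparison. The function names, the 
(word, parity bit) memory layout, and the even-parity convention are all 
assumptions made for the example; they do not describe the actual ALTO 
hardware or operating system. 

```python
def has_bad_parity(word, parity_bit):
    """Assumed convention: a 16-bit word plus its parity bit holds an even number of 1s."""
    ones = bin(word & 0xFFFF).count("1") + (parity_bit & 1)
    return ones % 2 != 0


def sweep(memory):
    """Return every address whose stored parity disagrees with its word."""
    return [addr for addr, (word, parity_bit) in enumerate(memory)
            if has_bad_parity(word, parity_bit)]


def classify(reported_addr, memory):
    """Classify an error report as type A, B, or C (hypothetical helper)."""
    bad_addrs = sweep(memory)
    if not bad_addrs:
        return "B: phantom"      # reported, but the sweep finds nothing
    if reported_addr in bad_addrs:
        return "A: regular"      # sweep confirms the reported location
    return "C: garbled"          # real error, but reported location is wrong


# Hypothetical three-word memory, initially all with good parity.
memory = [(0b0001, 1), (0b0011, 0), (0b0000, 0)]
print(classify(1, memory))       # nothing bad in memory

memory[2] = (0b0001, 0)          # corrupt word 2
print(classify(2, memory))       # report matches the sweep
print(classify(0, memory))       # report points at the wrong word
```

In this sketch a type C error is simply a real error whose reported address 
does not match anything the sweep finds, which is why it gives the 
maintainer no usable location information. 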

ALTO II's are different from ALTO I's in that single bit errors can be 
corrected but double bit errors are detected without correction. You should be 
aware that the operating system sets up the error correction hardware for 
ALTO II's such that single bit errors are corrected with no attempt to report 
them. It is only when a double bit error interrupt occurs that error 
information is reported. There is currently no ALTO operating system switch 
to cause a single bit error interrupt to make a report to MAXC. You should 
therefore be aware that when you are using an ALTO II and SWAT reports a 
parity error, it is from a double bit error interrupt and it is just as risky to 
proceed from these error interrupts as when SWAT reports a parity error on 
an ALTO I. One other point of clarification about ALTO II's: when DMT is 
running on an ALTO II it reports all parity errors, single bit or otherwise. 
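
The difference between a corrected single bit error and a detected double 
bit error can be illustrated with a toy code. The Python sketch below uses a 
4-bit Hamming code with one extra overall parity bit. That code is an 
assumption chosen for brevity; it is not the code used by the ALTO II memory, 
and the function names are hypothetical. 

```python
def encode(nibble):
    """Encode 4 data bits into 8 bits: index 0 is overall parity, 1..7 are Hamming positions."""
    d = [(nibble >> i) & 1 for i in range(4)]
    code = [0] * 8
    code[3], code[5], code[6], code[7] = d
    code[1] = code[3] ^ code[5] ^ code[7]   # covers positions 1, 3, 5, 7
    code[2] = code[3] ^ code[6] ^ code[7]   # covers positions 2, 3, 6, 7
    code[4] = code[5] ^ code[6] ^ code[7]   # covers positions 4, 5, 6, 7
    for pos in range(1, 8):
        code[0] ^= code[pos]                # overall parity over positions 1..7
    return code


def decode(code):
    """Return (data, status); status is 'ok', 'corrected', or 'double'."""
    code = list(code)
    syndrome = 0
    for pos in range(1, 8):
        if code[pos]:
            syndrome ^= pos                 # XOR of the positions holding a 1
    overall = 0
    for bit in code:
        overall ^= bit
    if syndrome == 0 and overall == 0:
        status = "ok"
    elif overall == 1:
        code[syndrome] ^= 1                 # single bit error: flip it back
        status = "corrected"
    else:
        return None, "double"               # two bits wrong: detect only
    data = code[3] | (code[5] << 1) | (code[6] << 2) | (code[7] << 3)
    return data, status


word = encode(0b1011)
one_bad = list(word); one_bad[5] ^= 1
two_bad = list(one_bad); two_bad[2] ^= 1
print(decode(word), decode(one_bad), decode(two_bad))
```

The single bit case returns the original data silently marked as corrected, 
while the double bit case returns no data at all, which mirrors why only the 
double bit case is treated as an error worth interrupting on. 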

It was the discovery of the garbled parity error information type that 
triggered the events which led to the addition of these two new switches to 
INSTALL. This type of error is particularly frustrating to maintainers of the 
machine because it is a real error but the maintainer does not get any 
information about where the error is occurring. 

We hope this memo explains what is going on with ALTO parity errors and 
those INSTALL switches, but should you have additional questions please 
submit them to us. 



